Kernel adaptive filter

In signal processing, a kernel adaptive filter is a type of nonlinear adaptive filter. An adaptive filter is a filter that adapts its transfer function to changes in signal properties over time by minimizing an error or loss function that characterizes how far the filter deviates from ideal behavior. The adaptation process is based on learning from a sequence of signal samples and is thus an online algorithm.
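As a minimal sketch of this online adaptation in the linear case, the least mean squares update below adjusts a weight vector by stochastic gradient descent on the squared prediction error (here \eta is a step-size parameter):

    e_n = d_n - \mathbf{w}_{n-1}^{\top} \mathbf{u}_n, \qquad
    \mathbf{w}_n = \mathbf{w}_{n-1} + \eta\, e_n \mathbf{u}_n

where \mathbf{u}_n is the input sample and d_n the desired response at step n.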
A nonlinear adaptive filter is one in which the transfer function is nonlinear. Kernel adaptive filters implement a nonlinear transfer function using kernel methods. In these methods, the signal is mapped to a high-dimensional linear feature space and a nonlinear function is approximated as a sum over kernels, whose domain is the feature space. If this is done in a reproducing kernel Hilbert space (RKHS), a kernel method can be a universal approximator for a nonlinear function. Kernel methods have the advantage of having convex loss functions, with no local minima, and of being only moderately complex to implement.
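Concretely, the filter's nonlinear transfer function is typically represented as a kernel expansion over previously seen samples. A common sketch, assuming a Gaussian kernel of width \sigma with stored centres \mathbf{u}_i and coefficients \alpha_i, is:

    f(\mathbf{u}) = \sum_{i=1}^{n} \alpha_i\, \kappa(\mathbf{u}_i, \mathbf{u}), \qquad
    \kappa(\mathbf{u}_i, \mathbf{u}) = \exp\!\left( -\frac{\lVert \mathbf{u} - \mathbf{u}_i \rVert^2}{2\sigma^2} \right)

Adapting the filter then amounts to updating the coefficients \alpha_i and, in self-organising variants, the set of centres itself.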
Because the high-dimensional feature space is linear, kernel adaptive filters can be thought of as a generalization of linear adaptive filters. As with linear adaptive filters, there are two general approaches to adapting a filter: the least mean squares filter (LMS) and the recursive least squares filter (RLS).
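The kernel least mean squares (KLMS) algorithm is the kernel counterpart of LMS. The following Python sketch, with illustrative parameter choices rather than a reference implementation, shows the basic recursion: predict with the current kernel expansion, compute the error, and append a new kernel centre weighted by the scaled error.

import numpy as np

class KLMS:
    """Minimal kernel least mean squares filter (illustrative sketch)."""

    def __init__(self, step_size=0.5, sigma=1.0):
        self.step_size = step_size  # LMS learning rate (eta)
        self.sigma = sigma          # Gaussian kernel width
        self.centers = []           # stored input samples u_i
        self.alphas = []            # expansion coefficients alpha_i

    def _kernel(self, x, y):
        # Gaussian (RBF) kernel between two input vectors
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * self.sigma ** 2))

    def predict(self, u):
        # f(u) = sum_i alpha_i * kappa(u_i, u)
        return sum(a * self._kernel(c, u)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, u, d):
        # One online step: the new sample becomes a kernel centre whose
        # coefficient is the step size times the prediction error.
        e = d - self.predict(u)
        self.centers.append(np.asarray(u, dtype=float))
        self.alphas.append(self.step_size * e)
        return e

For example, feeding the filter noisy samples of a sine wave:

rng = np.random.default_rng(0)
f = KLMS(step_size=0.5, sigma=0.5)
for _ in range(500):
    x = rng.uniform(-3.0, 3.0, size=1)
    f.update(x, np.sin(x[0]) + 0.01 * rng.standard_normal())
print(f.predict(np.array([1.0])))  # roughly sin(1) ~ 0.84

Note that this naive recursion grows the expansion by one term per sample, which is why the sparsification strategies discussed below matter in practice.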
Self-organising kernel adaptive filters that use iteration to achieve convex LMS error minimisation address some of the statistical and practical issues of nonlinear models that do not arise in the linear case. Regularisation is a particularly important feature of nonlinear models, and is also often used in linear adaptive filters to reduce statistical uncertainties. However, because nonlinear filters typically have a much higher potential structural complexity (or a higher-dimensional feature space) than the subspace actually required, regularisation of some kind must deal with the resulting under-determined model. Although some specific forms of parameter regularisation, such as those prescribed by Vapnik's structural risk minimisation (SRM) and the support vector machine (SVM), address the dimensionality problem statistically to some extent, further statistical and practical issues remain for truly adaptive nonlinear filters.

Adaptive filters are often used for tracking the behaviour of a time-varying system, or of systems that cannot be fully modelled from the available data and structure, so the models may need to adapt not only their parameters but also their structure. Where the structural parameters of kernels are derived directly from the data being processed (as in the support vector approach above), there are convenient opportunities for analytically robust methods of self-organisation of the kernels available to the filter. The linearised feature space induced by kernels allows linear projection of new samples onto the current structure of the model, where novelty in new data can easily be differentiated from noise-borne errors, which should not result in a change to the model structure (a minimal novelty test of this kind is sketched below). Analytical metrics for structure analysis can be used to grow model complexity parsimoniously when required, or to prune the existing structure optimally when processor resource limits are reached. Structure updates are also relevant when system variation is detected and the long-term memory of the model should be updated, as in the Kalman filter case for linear filters.

Iterative gradient descent of the kind typically used in adaptive filters has also gained popularity in offline, batch-mode, support-vector-based machine learning because of its computational efficiency for processing large data sets. Both time-series and batch implementations have been reported to handle over 100,000 training examples using as little as 10 kB of RAM. Data sizes this large are challenging for the original formulations of support vector machines and other kernel methods, which relied, for example, on constrained optimisation using linear or quadratic programming techniques.
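As a hedged illustration of the novelty test mentioned above, the following sketch (reusing the illustrative KLMS class, with hypothetical thresholds delta and epsilon) grows the kernel dictionary only when a sample is both far from every stored centre and poorly predicted:

def update_with_novelty(f, u, d, delta=0.1, epsilon=0.05):
    # Compute the prediction error on the new sample.
    e = d - f.predict(u)
    # Distance from the new sample to the nearest stored centre.
    if f.centers:
        nearest = min(np.linalg.norm(np.asarray(u) - c) for c in f.centers)
    else:
        nearest = np.inf
    if nearest > delta and abs(e) > epsilon:
        # Novel and informative: grow the model structure.
        f.centers.append(np.asarray(u, dtype=float))
        f.alphas.append(f.step_size * e)
    # Otherwise the error is treated as noise and the structure is left
    # unchanged; a pruning rule could analogously remove centres when
    # processor resource limits are reached.
    return e

This kind of test is similar in spirit to the novelty criterion used in resource-allocating networks; the thresholds delta and epsilon here are illustrative assumptions, not prescribed values.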

